Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
Authors
Abstract
Similar Resources
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...
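As a point of reference for the value iteration baseline this abstract builds on, a minimal tabular value iteration sketch for a discounted MDP might look as follows; the `P[a, s, s']` transition layout, `R[s, a]` reward matrix, and all names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration sketch (assumed layout: P[a, s, s'], R[s, a])."""
    V = np.zeros(R.shape[0])
    while True:
        # One Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')].
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        # Stop when the sup-norm change is below tolerance.
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

Because the backup is a gamma-contraction in the sup norm, the loop terminates for any gamma < 1.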
Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms a...
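The optimistic (modified) policy iteration scheme analyzed above alternates a greedy improvement step with a fixed number of evaluation sweeps instead of an exact policy evaluation. A minimal sketch, under an assumed tabular `P[a, s, s']` / `R[s, a]` layout (all names here are illustrative, not the paper's construction):

```python
import numpy as np

def modified_policy_iteration(P, R, gamma=0.9, m=5, iters=100):
    """Optimistic policy iteration sketch: greedy improvement plus
    m partial evaluation sweeps (assumed layout: P[a, s, s'], R[s, a])."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(iters):
        # Greedy policy improvement with respect to the current V.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        policy = Q.argmax(axis=1)
        # Partial (m-step) evaluation of the greedy policy,
        # rather than solving the linear system exactly.
        r_pi = R[np.arange(n_states), policy]
        P_pi = P[policy, np.arange(n_states), :]
        for _ in range(m):
            V = r_pi + gamma * P_pi @ V
    return V, policy
```

Setting m = 1 recovers value iteration, while letting m grow approaches exact policy iteration; the note above concerns the worst-case arithmetic cost of this family, not its convergence.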
Smooth Value and Policy Functions for Discounted Dynamic Programming
We consider a discounted dynamic program in which the spaces of states and actions are smooth (in a sense that is suitable for the problem at hand) manifolds. We give conditions that ensure that the optimal policy and the value function are smooth functions of the state when the discount factor is small. In addition, these functions vary in a Lipschitz manner as the reward function-discount fac...
Constrained Discounted Dynamic Programming
This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a fe...
An Efficient Policy Iteration Algorithm for Dynamic Programming Equations
We present an accelerated algorithm for the solution of static Hamilton-Jacobi-Bellman equations related to optimal control problems. Our scheme is based on a classic policy iteration procedure, which is known to have superlinear convergence in many relevant cases provided the initial guess is sufficiently close to the solution. This limitation often degenerates into a behavior similar to a valu...
Journal
Journal Title: Mathematics of Operations Research
Year: 2012
ISSN: 0364-765X,1526-5471
DOI: 10.1287/moor.1110.0532